COMPARISON OF THE VARIANCES OF THE NET AND GROSS ERROR RATES

Max  A.  Bershad* 


The  purpose  of  this  note  is  to  have  a 
readily  available  record  of  the  variance  of 
the  gross  error  rate  and  the  net  error  rate. 

A  simple  random  sample  of  n  elements  from 
a  very  large  population  has  been  taken.   For 
each  sample  element  two  independent  measure- 
ments have  been  made  by  the  same  method  of 
measurement.   The  measurements  are  made  on  a 
0,  1  variable.   The  results  of  the  two 
measurement  processes  can  be  displayed  as 
follows : 


[2 x 2 table: first measurement (rows) by second measurement (columns). Of the four cell counts, b is the number of sample elements measured "1" then "0", and c is the number measured "0" then "1".]

The expected value of repeated measurements on the same element, j, over trials is denoted by $X_{j\cdot}$. The response deviation for the j-th element on the t-th trial is then $(x_{jt} - X_{j\cdot})$. Response deviations of different elements are assumed to be uncorrelated.

The proportion of sample elements for
which the first measurement is "1" and the
second measurement is "0" is b/n and will be
denoted by $p_b$; and the proportion for which


*  The  final  version  was  prepared  after  the 
author's  death  by  Barbara  A.  Bailar. 


the first measurement is "0" and the second is "1" is c/n and will be denoted by $p_c$. Thus, the gross error rate is estimated by $p_b + p_c$, while the net error rate is estimated by $p_b - p_c$. The variance of the gross error rate, $\sigma^2_{p_b+p_c}$, and the variance of the net error rate, $\sigma^2_{p_b-p_c}$, differ only in the sign of the last term, i.e.,

$$\sigma^2_{p_b \pm p_c} = \sigma^2_{p_b} + \sigma^2_{p_c} \pm 2\sigma_{p_b p_c}.$$

Since we are dealing with a 0, 1 variable, we can consider the four variables displayed in the 2x2 table to have a multinomial distribution. Thus, $\mathrm{cov}(b,c) = \sigma_{bc} = -nP_bP_c$, and since we assume simple random sampling,

$$\mathrm{cov}(p_b, p_c) = \sigma_{p_b p_c} = -\frac{P_b P_c}{n}.$$

Since the two trials are assumed to be interchangeable, $P_b = P_c$. Thus,

$$\sigma_{p_b p_c} = -\frac{P_b^2}{n}.$$

Also, from the multinomial distribution we find that

$$\sigma^2_{p_b} = \sigma^2_{p_c} = \frac{P_b(1-P_b)}{n}.$$

Thus, we can write these variances as

$$\sigma^2_{p_b \pm p_c} = 2\left(\sigma^2_{p_b} \pm \sigma_{p_b p_c}\right) = \frac{2}{n}\left[(P_b - P_b^2) \mp P_b^2\right].$$

To determine $E(p_b) = P_b$, we note that

$$p_b = \frac{1}{n}\sum_{j=1}^{n} x_{jt}\,(1 - x_{jt'})$$

where $x_{jt}$ is the measurement on the j-th element on the first trial and $x_{jt'}$ is the measurement on the second trial, each measurement being a "0" or a "1". The conditional expected value of $p_b$ for the given sample over all trials is

$$\frac{1}{n}\sum_{j=1}^{n} X_{j\cdot}\,(1 - X_{j\cdot}).$$

Then, the expected value over all samples is

$$E(p_b) = \frac{1}{N}\sum_{j=1}^{N} X_{j\cdot}(1 - X_{j\cdot}) = \frac{1}{N}\sum_{j=1}^{N} X_{j\cdot} - \frac{1}{N}\sum_{j=1}^{N} X_{j\cdot}^2 = \bar{X}_{\cdot\cdot} - (\sigma_S^2 + \bar{X}_{\cdot\cdot}^2)$$

where $\sigma_S^2$ is the sampling variance defined by

$$\sigma_S^2 = E[X_{j\cdot} - \bar{X}_{\cdot\cdot}]^2 = E X_{j\cdot}^2 - \bar{X}_{\cdot\cdot}^2.$$

Thus,

$$E(p_b) = (\bar{X}_{\cdot\cdot} - \bar{X}_{\cdot\cdot}^2) - \sigma_S^2 = \bar{X}_{\cdot\cdot}(1 - \bar{X}_{\cdot\cdot}) - \sigma_S^2.$$

But $\bar{X}_{\cdot\cdot}(1 - \bar{X}_{\cdot\cdot})$ is the total variance, $\sigma^2$, for a 0, 1 variable, and $\sigma^2 = \sigma_R^2 + \sigma_S^2$, where $\sigma_S^2$ denotes the sampling variance and $\sigma_R^2 = E(x_{jt} - X_{j\cdot})^2$ denotes the simple response variance, the expectation being taken over all trials and samples. Thus,

$$E(p_b) = P_b = \sigma_R^2.$$

Then,

$$\sigma^2_{p_b \pm p_c} = \frac{2}{n}\left[(\sigma_R^2 - \sigma_R^4) \mp \sigma_R^4\right]$$

and

$$\sigma^2_{p_b - p_c} = \frac{2}{n}\left[\sigma_R^2\right]$$

while

$$\sigma^2_{p_b + p_c} = \frac{2}{n}\left[\sigma_R^2 - 2\sigma_R^4\right].$$

Thus, if the conditions of the second
paragraph apply, the variance of the gross
error rate is the variance of the net error
rate less n times the square of the variance
of the net error rate. When the simple response
variance is small, the variances of the
gross and net error rates are almost the same.
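
The relationship stated above can be checked numerically. The following is a minimal sketch, assuming only the results of this note ($P_b = P_c = \sigma_R^2$, multinomial variances and covariance); the function name and the illustrative values 0.05 and 400 are hypothetical and do not come from the note.

```python
def error_rate_variances(sigma_r2: float, n: int) -> tuple[float, float]:
    """Return (net, gross) error-rate variances for a 0-1 variable.

    Assumes P_b = P_c = sigma_r2 (the simple response variance) and a
    simple random sample of n elements, as in the note.
    """
    p_b = sigma_r2
    var_p = p_b * (1.0 - p_b) / n   # variance of p_b (and of p_c)
    cov = -p_b * p_b / n            # covariance of p_b and p_c
    net = 2.0 * (var_p - cov)       # variance of p_b - p_c
    gross = 2.0 * (var_p + cov)     # variance of p_b + p_c
    return net, gross


# Illustrative values only.
net, gross = error_rate_variances(sigma_r2=0.05, n=400)
print(net, gross, net - 400 * net ** 2)
```

For a small simple response variance the last two printed values agree and both are close to the first, which is the statement made in the closing paragraph.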
EFFECT OF NON RESPONDENTS ON ESTIMATES
OF  SALES  TREND 

Max  A.  Bershad* 


The problem below occurs in many forms,
particularly with respect to non-response and
cut-off surveys.

THE  PROBLEM 

A sales trend $(1+\Delta_1)$ is computed from the
reports of a select group (say respondents)
that covers a given proportion (p) of the
total sales of the universe. What must be
the sales trend $(1+\Delta_2)$ for the non-selected
group covering the remainder of the universe,
if the sales trend $(1+\Delta)$ for the universe is
to fall within a given percentage ($\pm e$) of the
trend for the select group?

THE SOLUTION

$$(1+\Delta_2) = (1+\Delta_1)\left(1 \pm \frac{e}{q}\right) \quad \text{where } q = 1-p$$

AN ILLUSTRATION

A reporting group with 85 percent coverage of
an industry's volume reports an increase of
25 percent. What is the allowable range for
the balance of the industry such that the true
trend for the industry shall be within 5
percent of 1.25 (or an increase of 19 percent
to 31 percent)?

$$(1+\Delta_2) = 1.25\left(1 \pm \frac{.05}{.15}\right) = 1.25\,(1 \pm .33) = 1.25 \pm .42$$

The balance of the industry can therefore
have a trend of anywhere from .83 to
1.67. In other words, the balance would
have to drop more than 17 percent or increase
more than 67 percent before the reporting
group would not reflect the true answer within
5 percent.
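
The solution can be evaluated directly. The sketch below assumes that the universe trend is the coverage-weighted average $p(1+\Delta_1) + q(1+\Delta_2)$, which is what the stated solution implies; the function and parameter names are hypothetical.

```python
def nonrespondent_trend_range(trend_select: float, p: float, e: float) -> tuple[float, float]:
    """Allowable range of the trend (1 + delta_2) for the non-selected group.

    trend_select: trend (1 + delta_1) reported by the select group
    p: proportion of universe sales covered by the select group
    e: tolerated relative deviation of the universe trend (0.05 = 5 percent)
    """
    q = 1.0 - p
    low = trend_select * (1.0 - e / q)
    high = trend_select * (1.0 + e / q)
    return low, high


# The illustration in the text: 85 percent coverage, trend 1.25, e = 5 percent.
low, high = nonrespondent_trend_range(1.25, 0.85, 0.05)
print(round(low, 2), round(high, 2))
```

At either end of the range the weighted universe trend sits exactly at the edge of the tolerance, e.g. .85(1.25) + .15(.8333) = 1.1875 = 1.25(.95).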

The following table presents values of
$(1+\Delta_2)$ for

(i)  values of $(1+\Delta_1)$ from .70 to 1.50 and
(ii)  values  of  p       from  1,0%   to  95$  and 
(iii)  values  of  e       of  .5%,   1%,    2%,    5% 


* The final version was prepared after the
author's death by Michael Berry. In the
original draft the title was Algebraic Note.
A PROPOSAL FOR OBTAINING VARIANCE COMPUTATIONS
AS  A  BY-PRODUCT  OF  REGULAR  TABULATIONS 

Margaret  Gurney 


INTRODUCTION 

A  common  method  for  computing  variances  of 
estimates  made  from  sample  data  is  to  divide 
the  sample  observations  into  random  groups, 
and  to  use  the  totals  of  the  several  random 
groups  as  a  basis  for  computation. 

A  procedure  used  extensively  at  the  Bureau 
of  the  Census  permits  the  use  of  the  standard 
tabulation  programs  (written  to  tabulate  the 
sample) as a device for obtaining the random
group  totals  which  are  needed  to  compute 
variances. 

ASSIGNMENT  TO  RANDOM  GROUPS 

When  the  records  to  be  tabulated  are  edited, 
weighted,  and  ready  to  tabulate,  they  are  put 
into  a  sort  which  reflects  the  sample  design. 
If multi-stage sampling is used, the whole
sample within a first step unit should be
assigned to the same random group. The
reliability of the estimates of the sampling
variance depends on the number of random groups
used and on the number of sample elements within
each random group. The considerations leading
to a choice of the number of random groups are
given in HHM, Vol. I.1 In the discussion which
follows, however, we use 10 random groups for
illustrative purposes.

According  to  the  rules  established,  each 
record  is  assigned  to  one  of  the  10  random 
groups,  and  the  random  group  number  is  recorded 
in  an  appropriate  field  of  the  record.   Each  of 
the  10  groups  will  have  approximately  one- 
tenth  of  the  original  records  in  it  (some 
variability  will  arise  if  clusters  of  units 
are  assigned  as  a  whole  to  a  single  group) . 
Each of the 10 random groups will, indeed, be
a miniature of the whole sample, and the random
group totals, when multiplied by 10, will
produce estimates similar to those to be obtained
from the whole sample. The variance between
the 10 random group estimates can be used to
estimate the variance of the estimate from
the whole sample.

TABULATION 

The  records  are  sorted  for  a  particular 
tabulation,  with  an  additional  sort  on  random 
group  as  the  most  minor  sort.   For  example, 
for  a  tabulation  of  farm  workers,  a  tabulation 
by  region,  type  of  farm,  economic  class  of 
farm,  sex  and  age  of  the  farm  operator  may  be 


1Hansen,  Hurwitz  and  Madow,  Sample  Survey 
Methods  and  Theory,  Vol.  I.,  p  LMX. 


desired. If the tally matrix provides cells
for tallying economic class of farm, sex of
operator, and age of operator, the only sorts
needed for the tabulation of the sample would
be by region and type of farm. To obtain
random group totals, an additional minor sort
by random group is required, so that the sort
would be

    region
    type of farm
    random group

The  tallying  puts  the  records  in  the  cells 
corresponding  to  economic  class  of  farm,  sex, 
and  age,  in  the  conventional  way,  throughout 
the  first  random  group.  When  the  last  record 
of  random  group  1  has  been  tallied,  the  tally 
matrix  is  written  out  on  an  auxiliary  "Random 
Group  Tape."  The  tally  matrix  is  not  cleared 
at  this  point. 

The  records  in  random  group  2  are  then 
tallied  in  the  same  matrix,  and  the  cumulative 
total  of  random  groups  1  and  2  is  written  out 
on  the  Random  Group  Tape;  the  tally  matrix  is 
not  cleared. 

The  procedure  is  continued,  with  a  write- 
out  at  the  end  of  each  random  group,  until 
the  Random  Group  Tape  has  the  final  write-out, 
which  is  the  cumulative  sum  of  the  10  random 
groups  for  the  tally  matrix.   At  this  time 
the  tally  matrix  contains  the  table  which  is 
desired  for  publication  of  estimates  from 


the whole sample; this can be written on
another  tape,  and  prepared  for  printing,  in 
the  usual  way. 

UNSCRAMBLING  OF  "RANDOM  GROUP  TAPE" 

The Random Group Tape will, of course, need
to  have  sufficient  identification,  so  that 
the  cells  in  each  write-out  of  the  tally  matrix 
can  be  identified.   It  is  then  a  simple  matter 
to  obtain  the  totals  for  a  single  random  group 
by  subtraction:  for  example,  random  group  7 
is  obtained  by  subtracting  the  cumulative 
totals  for  groups  1  through  6  from  those  for 
groups  1  through  7. 

During  the  "unscrambling"  process,  a 
selection  of  items  may  be  made,  if  it  is  not 
necessary  to  compute  variances  for  every 
cell  of  every  table.   Moreover,  data  in  a 
suitable  combination  of  cells  may  be  added, 
within  each  random  group,  and  variances  may 
be  computed  for  subtotals  and  totals. 
VARIANCE  COMPUTATION 

The computation of variances using random
group totals is a well-known procedure;
reference may be made to HHM, Vol. I, page 444.
If  the  estimation  procedure  involves  the  use 
of  ratio  adjustments  or  composite  estimators, 
these  procedures  should  be  applied  to  each 
random  group  separately,  before  the  variance 
is  computed.   Otherwise  the  effect  of  these 
features  of  the  estimation  procedure  will 
not  be  reflected  in  the  variance  computation. 
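
The unscrambling and variance steps can be sketched as follows. This is a minimal sketch under stated assumptions: each write-out is represented as a list of cumulative cell totals, and the variance uses the usual random-group estimator, the sum of squared deviations of the k inflated group totals from the whole-sample total divided by k(k-1). That formula is assumed here and is not reproduced from HHM; the function names and the three-group data are hypothetical.

```python
def random_group_totals(cumulative):
    """Recover single-group cell totals from cumulative write-outs.

    cumulative[g] holds the tally matrix (flattened to a list of cells)
    written out after random group g + 1; the matrix was never cleared.
    """
    totals = []
    previous = [0] * len(cumulative[0])
    for writeout in cumulative:
        totals.append([c - p for c, p in zip(writeout, previous)])
        previous = writeout
    return totals


def random_group_variance(cell_totals):
    """Estimated variance of the whole-sample total for one cell.

    cell_totals: that cell's total in each of the k random groups.
    Each group total is inflated by k to estimate the whole-sample total.
    """
    k = len(cell_totals)
    whole = sum(cell_totals)
    return sum((k * t - whole) ** 2 for t in cell_totals) / (k * (k - 1))


# Hypothetical tape with 3 random groups and 2 cells per write-out.
tape = [[5, 2], [9, 5], [15, 6]]
groups = random_group_totals(tape)
print(groups)
print(random_group_variance([g[0] for g in groups]))
```

If ratio adjustments or composite estimators are used, they would be applied to each group's totals before the variance step, as the text requires.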


ADVANTAGES  OF  THE  PROPOSAL 

Two  major  advantages  of  the  proposal 
described  above  are: 

1.  It  does  not  (appreciably)  affect  the 
running  time  of  the  tabulations 
required  for  publication. 

2.  It  does  not  require  the  preparation 
of  special  tabulation  programs,  or  an 
early  selection  of  items  for  which 
variances  are  desired. 


The  additional  work  involved  in  getting 
the  random  group  code  into  the  record  is 
nominal,  and  may  be  done  as  part  of  some  edit, 
or  preliminary  (test)  tabulation  run. 

The  sort  by  random  group  is  a  minor  sort, 
and  follows  all  other  characteristics  in  the 
sort  required  for  any  particular  tally  matrix. 
Hence  it  requires  very   little  additional  in 
the  way  of  programming  and  running  time. 


THE  DIFFERENCE  BETWEEN  THE  MEDIAN  OF  A  DIFFERENCE 
AND  THE  DIFFERENCE  OF  THE  MEDIANS 


Nigel  F.  Nettheim 


THE  PROBLEM 

Consider  a  population  or  sample,  finite 
or  infinite,  on  which  two  measurements  are 
defined: 

$X = X_1, X_2, \ldots$ and $Y = Y_1, Y_2, \ldots$
Write $X-Y = X_1-Y_1, X_2-Y_2, \ldots$ and write
med(X) for the median of X; then it is
desired to study the relationship between
med(X-Y) and med(X)-med(Y).

This  problem  arose  in  studying  human 
life  cycles;  for  example,  Y  may  be  the  age 
at  first  marriage  and  X  may  be  the  age  of  a 
mother  at  the  birth  of  her  first  child; 
then  if  med(X)  and  med(Y)  are  available  it 
may  be  desired  to  make  inferences  about 
med(X-Y),  that  is,  the  median  time  interval 
between  first  marriage  and  birth  of  first 
child. 

The  difference  between  the  medians  could 
in  some  cases  be  a  perfectly  good  estimate 
of  the  median  of  the  differences;  for  example, 
let 

X = 4, 6, 8 and Y = 1, 2, 3;
then X-Y = 3, 4, 5 and we have med(X-Y) = 4
as well as med(X)-med(Y) = 6-2 = 4.

In  other  cases  the  difference  between  the 


medians  could  be  a  very  poor  estimate;  for 
example,  let 

X = 150, 200, 101 and Y = 1, 50, 100;
then X-Y = 149, 150, 1 and we have med(X-Y)
= 149 by contrast with med(X)-med(Y)
= 150-50 = 100.
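
The two numerical examples can be reproduced with a few lines of Python. This is only an illustration of the definitions; the helper name `compare_medians` is hypothetical.

```python
from statistics import median


def compare_medians(x, y):
    """Return (med(X-Y), med(X)-med(Y)) for paired measurements."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    return median(diffs), median(x) - median(y)


print(compare_medians([4, 6, 8], [1, 2, 3]))            # (4, 4)
print(compare_medians([150, 200, 101], [1, 50, 100]))   # (149, 100)
```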

AN  APPROXIMATE  SOLUTION 

If the random variables X, Y were jointly
normally distributed, then X-Y would also be
normal, and we would have med(X-Y) = med(X) - med(Y).
It is desired to consider the case of
non-normality, especially when X and Y are not
independent. In the demographic example
(which is our motivation), a key feature is
the skewness (standardized third moment) of
the distributions; this can be introduced
by using the Gram-Charlier series expansion
for frequency functions (see e.g. Kendall and
Stuart Vol. I, p. 156) up to the third order
term.   It  is  true  that  the  fourth  and  higher 
moments  could  also  differ  from  those  of  the 
normal  distribution,  and  that  some  of  these 
moments  could  be  taken  into  account  by  an 
extension  of  the  method  to  be  used  below 
but,  since  this  would  greatly  increase  the 
complexity  of  the  resulting  formulae  and  the 


number  of  parameters  to  be  estimated,  it 
should  be  hoped  that  a  sufficient  indication 
of  the  behavior  of  the  estimates  will  be 
achieved  without  referring  to  such  higher 
moments . 

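It is shown by J.B.S. Haldane (1942,
p. 296) that, to the order of approximation
just mentioned,

$$\mathrm{med}(X) \approx \mu_1(X) - \tfrac{1}{6}\,\mu_3(X)/\mu_2(X) \qquad (1)$$

where $\mu_1(X)$ is the mean of X, and $\mu_r(X) = E(X-\mu_1(X))^r$ is the rth central moment of X. Then also

$$\mathrm{med}(Y) \approx \mu_1(Y) - \tfrac{1}{6}\,\mu_3(Y)/\mu_2(Y) \qquad (2)$$

and

$$\mathrm{med}(X-Y) \approx \mu_1(X-Y) - \tfrac{1}{6}\,\mu_3(X-Y)/\mu_2(X-Y). \qquad (3)$$

Using the relations

$$\mu_1(X-Y) = \mu_1(X) - \mu_1(Y) \qquad (4)$$

$$\mu_2(X-Y) = \mu_2(X) + \mu_2(Y) - 2\rho(X,Y)\sqrt{\mu_2(X)\mu_2(Y)} \qquad (5)$$

$$\mu_3(X-Y) = \mu_3(X) - \mu_3(Y) + 3\rho(X,Y^2)\sqrt{\mu_2(X)\mu_2(Y^2)} - 3\rho(X^2,Y)\sqrt{\mu_2(X^2)\mu_2(Y)} + 6\rho(X,Y)\sqrt{\mu_2(X)\mu_2(Y)}\,[\mu_1(X) - \mu_1(Y)] \qquad (6)$$

where $\rho(X,Y)$ is the correlation between X and
Y, we find the approximate relationship

$$\mathrm{med}(X-Y) \approx \mathrm{med}(X) - \mathrm{med}(Y) + \tfrac{1}{6}\,\mu_3(X)/\mu_2(X) - \tfrac{1}{6}\,\mu_3(Y)/\mu_2(Y)$$
$$\quad - \tfrac{1}{6}\Big\{\mu_3(X) - \mu_3(Y) + 3\rho(X,Y^2)\sqrt{\mu_2(X)\mu_2(Y^2)} - 3\rho(X^2,Y)\sqrt{\mu_2(X^2)\mu_2(Y)}$$
$$\quad + 6\rho(X,Y)\sqrt{\mu_2(X)\mu_2(Y)}\,\big[\mathrm{med}(X) - \mathrm{med}(Y) + \tfrac{1}{6}\,\mu_3(X)/\mu_2(X) - \tfrac{1}{6}\,\mu_3(Y)/\mu_2(Y)\big]\Big\}$$
$$\quad \Big/ \Big\{\mu_2(X) + \mu_2(Y) - 2\rho(X,Y)\sqrt{\mu_2(X)\mu_2(Y)}\Big\}. \qquad (7)$$
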
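
The approximation (7) is tedious to evaluate by hand, so a small function may help. The following is a sketch only: it builds (7) from relations (1) through (6) with the coefficient 1/6, all moments and correlations are supplied by the caller, and every function and argument name is hypothetical.

```python
from math import sqrt


def median_of_difference(med_x, med_y, m2x, m2y, m3x, m3y,
                         rho_xy=0.0, rho_x_y2=0.0, rho_x2_y=0.0,
                         m2_x2=0.0, m2_y2=0.0):
    """Approximate med(X-Y) from the medians and moments of X and Y.

    m2x, m3x: second and third central moments of X (same for Y)
    rho_xy: correlation of X and Y
    rho_x_y2, rho_x2_y: correlations of X with Y^2 and of X^2 with Y
    m2_x2, m2_y2: variances of X^2 and Y^2
    """
    # mu1(X) - mu1(Y), recovered from the medians via (1) and (2)
    mean_diff = med_x - med_y + m3x / (6.0 * m2x) - m3y / (6.0 * m2y)
    cov_xy = rho_xy * sqrt(m2x * m2y)
    m2_diff = m2x + m2y - 2.0 * cov_xy                      # relation (5)
    m3_diff = (m3x - m3y
               + 3.0 * rho_x_y2 * sqrt(m2x * m2_y2)
               - 3.0 * rho_x2_y * sqrt(m2_x2 * m2y)
               + 6.0 * cov_xy * mean_diff)                  # relation (6)
    return mean_diff - m3_diff / (6.0 * m2_diff)            # relation (3)


# Independent X and Y with unit variances: the correction to
# med(X) - med(Y) is (mu3(X) - mu3(Y)) / 12.
print(median_of_difference(10.0, 4.0, 1.0, 1.0, 1.2, 0.6))
```

With zero third moments and zero correlations the function returns med(X) - med(Y), as in the normal case.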

SPECIAL  CASES 

In general the result (7) involves not
only the skewness ($\mu_3/\mu_2^{3/2}$) of the two
distributions but also their variances, and it
involves not only the correlation between the
variables but also the correlation between
each variable and the square of the other; it
may also be noted that $\mu_2(X^2) = E(X^4) - [E(X^2)]^2$,
which reduces to $\mu_4(X) - \mu_2^2(X)$ when X has mean zero.
It may be worthwhile giving the simplified
formulae which arise in several special cases,
as follows.

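(i)   If X and Y are independent (which
will not be the case in the demographic
application mentioned above) we have

$$\mathrm{med}(X-Y) \approx \mathrm{med}(X) - \mathrm{med}(Y) + \tfrac{1}{6}\left[\frac{\mu_3(X)}{\mu_2(X)} - \frac{\mu_3(Y)}{\mu_2(Y)} - \frac{\mu_3(X) - \mu_3(Y)}{\mu_2(X) + \mu_2(Y)}\right] \qquad (8)$$

(ii)   If X and Y are independent and also
$\mu_2(X) = \mu_2(Y) = 1$ then

$$\mathrm{med}(X-Y) \approx \mathrm{med}(X) - \mathrm{med}(Y) + \tfrac{1}{12}\,(\mu_3(X) - \mu_3(Y)). \qquad (9)$$
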
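(iii)   If $\mu_2(X) = \mu_2(Y) = 1$ then

$$\mathrm{med}(X-Y) \approx [\mathrm{med}(X)-\mathrm{med}(Y)]\left[1 - \tfrac{1}{2}\,\rho(X,Y)/(1-\rho(X,Y))\right] + \tfrac{1}{6}\,[\mu_3(X)-\mu_3(Y)]$$
$$\quad - \tfrac{1}{12}\Big\{[\mu_3(X)-\mu_3(Y)][1+\rho(X,Y)] + 3\rho(X,Y^2)\sqrt{\mu_2(Y^2)} - 3\rho(X^2,Y)\sqrt{\mu_2(X^2)}\Big\} \Big/ \{1-\rho(X,Y)\} \qquad (10)$$

(iv)   If $\mu_3(X) = \mu_3(Y)$ and $\mu_2(X) = \mu_2(Y) = 1$ then

$$\mathrm{med}(X-Y) \approx [\mathrm{med}(X)-\mathrm{med}(Y)]\left\{1 - \tfrac{1}{2}\,\rho(X,Y)/[1-\rho(X,Y)]\right\}$$
$$\quad - \tfrac{1}{4}\Big\{\rho(X,Y^2)\sqrt{\mu_2(Y^2)} - \rho(X^2,Y)\sqrt{\mu_2(X^2)}\Big\} \Big/ [1-\rho(X,Y)] \qquad (11)$$

ESTIMATION OF PARAMETERS
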

In order to make use of the result (7),
or one of the forms (8) - (11), it is
necessary to have some knowledge about the moments,
both simple and joint, which appear in those
formulae. This information may be obtainable
from one of the following sources. (In case
none of these methods can be used, it would
apparently have to be admitted that the
information available is not sufficient to answer
the question originally posed.)

(a)   Estimate  all  moments  from  a  sample  of 


the  original  data,  (X,Y),  or  from  a 
joint  distribution  thought  to  be 
similar  to  the  original  one. 

(b)  Estimate  the  simple  moments  from 
the  sample  quantiles,  if  these  are 
available  (see  for  example  Benson 
(1949)). 

(c)  Make  reasonable  guesses  of  the 
needed  moments . 
THE APPLICATION OF A LINKAGE MODEL TO THE CHANDRASEKAR-DEMING TECHNIQUE
FOR ESTIMATING VITAL EVENTS

Benjamin  J.  Tepping 


The Chandrasekar-Deming technique [1] for
estimating  the  number  of  vital  events  depends 
upon  determining  the  number  of  events  reported 
by  both  of  two  sources  (for  example,  a 
sample  survey  and  an  official  registration 
system).   Thus,  the  matching  of  events 
reported  on  two  lists  is  critical.   Jabine 
and  Bershad  [2],  Marks  [3]  and  Seltzer  and 
Adlakha [4] have discussed the effect of
matching  errors  on  the  bias  of  the 
Chandrasekar-Deming  estimator.   The  present 
writer  [5]  has  presented  a  general  model 
for  the  linkage  of  records.   This  note  is  an 
attempt  to  apply  that  model  to  the  problem  of 
matching  in  the  Chandrasekar-Deming  technique, 
and  especially  to  suggest  an  appropriate 
utility  function  as  required  by  the  model. 
It  is  assumed  in  the  sequel  that  the  reader 
is  familiar  with  [5]. 

Assume  that  for  any  pair  of  events  the 
possible  actions  are: 

$a_1$ :   link

$a_2$ :   nonlink

$a_3$ :   field check of the pair, resulting
in treatment of the pair as
link or nonlink, respectively
sub-actions $a_{31}$ or $a_{32}$.


We assume that $a_{31}$ and $a_{32}$ are stochastic, so
that we may speak of $P(a_{31}|M,\gamma)$ and $P(a_{32}|M,\gamma)$,
i.e., the conditional probabilities, given
that a pair subjected to a field check is
truly a match and has the comparison vector $\gamma$,
that the pair will be treated as a link or a
nonlink. Let the cost of conducting the
field check for a single pair be K. Then
the utilities of the several actions may be
written:

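$$U(a_1|\gamma) = U(a_1|M)P(M|\gamma) + U(a_1|\bar{M})P(\bar{M}|\gamma)$$
$$U(a_2|\gamma) = U(a_2|M)P(M|\gamma) + U(a_2|\bar{M})P(\bar{M}|\gamma)$$
$$U(a_3|\gamma) = U(a_3|M)P(M|\gamma) + U(a_3|\bar{M})P(\bar{M}|\gamma)$$

Further,

$$U(a_3|M) = U(a_1|M)P(a_{31}|M,\gamma) + U(a_2|M)P(a_{32}|M,\gamma) - K$$
$$U(a_3|\bar{M}) = U(a_1|\bar{M})P(a_{31}|\bar{M},\gamma) + U(a_2|\bar{M})P(a_{32}|\bar{M},\gamma) - K$$

We note that

$$P(M|\gamma) = 1 - P(\bar{M}|\gamma) = x(\gamma), \text{ say}$$
$$P(a_{31}|M,\gamma) = 1 - P(a_{32}|M,\gamma) = y(\gamma), \text{ say}$$
$$P(a_{32}|\bar{M},\gamma) = 1 - P(a_{31}|\bar{M},\gamma) = z(\gamma), \text{ say.}$$

We then have

$$U(a_1|\gamma) = U(a_1|M)\,x + U(a_1|\bar{M})(1-x)$$
$$U(a_2|\gamma) = U(a_2|M)\,x + U(a_2|\bar{M})(1-x)$$
$$U(a_3|\gamma) = [U(a_1|M)\,y + U(a_2|M)(1-y) - K]\,x + [U(a_1|\bar{M})(1-z) + U(a_2|\bar{M})\,z - K](1-x)$$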
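Simplifying somewhat, we get

$$U(a_1|\gamma) = [U(a_1|M) - U(a_1|\bar{M})]\,x + U(a_1|\bar{M})$$
$$U(a_2|\gamma) = [U(a_2|M) - U(a_2|\bar{M})]\,x + U(a_2|\bar{M})$$
$$U(a_3|\gamma) = \big[\{U(a_1|M) - U(a_2|M)\}\,y + \{U(a_1|\bar{M}) - U(a_2|\bar{M})\}\,z + U(a_2|M) - U(a_1|\bar{M})\big]\,x - \{U(a_1|\bar{M}) - U(a_2|\bar{M})\}\,z + U(a_1|\bar{M}) - K$$

Let us denote

$$C_{11} = U(a_1|M) \qquad C_{12} = U(a_1|\bar{M})$$
$$C_{21} = U(a_2|M) \qquad C_{22} = U(a_2|\bar{M})$$

Then

$$U(a_1|\gamma) = (C_{11}-C_{12})\,x + C_{12}$$
$$U(a_2|\gamma) = (C_{21}-C_{22})\,x + C_{22}$$
$$U(a_3|\gamma) = [(C_{11}-C_{21})\,y + (C_{12}-C_{22})\,z + C_{21} - C_{12}]\,x - (C_{12}-C_{22})\,z + C_{12} - K$$
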
The utility surfaces for $a_1$ and $a_2$ are
hyperplanes over the (x,y,z)-space but the
utility surface for $a_3$ is hyperbolic. The
unit cube in the (x,y,z)-space is divided
into 3 decision regions according as
$U(a_1|\gamma)$, $U(a_2|\gamma)$ or $U(a_3|\gamma)$ is the greatest.
Now

$$U(a_1|\gamma) < U(a_2|\gamma) \text{ iff } (C_{11}-C_{12}-C_{21}+C_{22})\,x + C_{12} - C_{22} < 0$$

$$U(a_1|\gamma) < U(a_3|\gamma) \text{ iff } [C_{11}-C_{21}-(C_{11}-C_{21})\,y - (C_{12}-C_{22})\,z]\,x + (C_{12}-C_{22})\,z + K < 0$$

$$U(a_2|\gamma) < U(a_3|\gamma) \text{ iff } [C_{12}-C_{22}-(C_{11}-C_{21})\,y - (C_{12}-C_{22})\,z]\,x + (C_{12}-C_{22})\,z + C_{22} - C_{12} + K < 0$$

From these inequalities we can calculate the
decision regions in the unit cube for
specified values of $C_{ij}$.

Our problem now is to specify the values
of the utilities $C_{ij}$ and the cost K of making
a field check for a single pair of events.
Without loss of generality we may take K=1,
thus specifying the $C_{ij}$ in units of the cost
of a field check.

Let  us  assume  that  the  matching  rule 
adopted  links  each  event  contained  in  the 
sample  with  no  more  than  one  event  contained 
in  the  register.   Let  a'  denote  the  number  of 
sample  events  linked  with  the  register,  and  b' 
the  number  of  sample  events  not  linked  with 
the register. If the sample is self-weighting,
the estimated number of matching events is ka'
and the estimated number of non-matching
events (i.e., the number of events discoverable
by the sample survey process, but not
contained in the register) is kb'. Let R denote
the total number of events contained in the
register, so that the CD estimate of total
events in the population may be written

$$x' = \frac{R(a'+b')}{a'} = \frac{Rn}{a'}$$

where n = a'+b' denotes the number of sample
events. We may write

$$a' = n_{11} + n_{121} + n_{122}$$

$$b' = n_{21} + n_{22}$$

where 

$n_{11}$ = number of sample events linked with
a matching event of the register

$n_{121}$ = number of sample events linked with
a non-matching event of the register,
but a matching event exists in the
register

$n_{122}$ = number of sample events linked with
a non-matching event of the register,
and no matching event exists in the
register

$n_{21}$ = number of sample events not linked
with any event of the register, but
a matching event exists in the
register

$n_{22}$ = number of sample events not linked
with any event of the register, and
no matching event exists in the
register.


Now,

$$x' = \frac{Rn}{n_{11}+n_{121}+n_{122}}.$$

If the true matches and nonmatches in the
sample were known, the CD estimate would be

$$x'' = \frac{Rn}{n_{11}+n_{121}+n_{21}}$$

so that the error of x' resulting from
linkage errors is

$$x' - x'' = \frac{Rn\,(n_{21}-n_{122})}{(n_{11}+n_{121}+n_{122})(n_{11}+n_{121}+n_{21})}$$

and the relative error of x' resulting from
linkage errors is

$$e = \frac{x'-x''}{x'} = \frac{n_{21}-n_{122}}{n_{11}+n_{121}+n_{21}}.$$

It may be noted that e is zero if $n_{21} = n_{122}$, as would be expected.

For a fixed value of n we may replace one
of the five variables by a function of the
other four, say

$$n_{22} = f(n_{11}, n_{121}, n_{122}, n_{21})$$

and then write

$$\frac{\partial e}{\partial n_{11}} = -\frac{n_{21}-n_{122}}{(n_{11}+n_{121}+n_{21})^2}$$

$$\frac{\partial e}{\partial n_{121}} = -\frac{n_{21}-n_{122}}{(n_{11}+n_{121}+n_{21})^2}$$

$$\frac{\partial e}{\partial n_{122}} = -\frac{1}{n_{11}+n_{121}+n_{21}}$$

$$\frac{\partial e}{\partial n_{21}} = \frac{n_{11}+n_{121}+n_{122}}{(n_{11}+n_{121}+n_{21})^2}$$

These expressions for the derivatives suggest
that the rates of change of e with respect to
$n_{11}$ and $n_{121}$ are negligible, and that
(except for the direction of the error) the
rates of change of e with respect to $n_{122}$ and
$n_{21}$ are roughly the same, for

$$\frac{\partial e}{\partial n_{21}} = -\frac{\partial e}{\partial n_{122}}\left[1 - \frac{n_{21}-n_{122}}{n_{11}+n_{121}+n_{21}}\right].$$

If we take negative and positive errors to be
equally important, we may take

$$C_{11} = C_{22} = 0$$

$$C_{12} = C_{21} = -V$$

where V is the amount that one judges he
should be willing to pay (in units of the cost
of a field check for one event) for a
reduction of $\frac{1}{m}$ in the relative error of the
estimate, m being the number of sample events
which are also contained in the register.

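We may now determine the decision regions
by the conditions

$$U(a_1|\gamma) < U(a_2|\gamma) \text{ iff } x < 1/2$$

$$U(a_1|\gamma) < U(a_3|\gamma) \text{ iff } x < \frac{z - 1/V}{1-y+z}$$

$$U(a_2|\gamma) < U(a_3|\gamma) \text{ iff } x > \frac{1 - z + 1/V}{1+y-z}$$

Thus the actions $a_1$, $a_2$, $a_3$ are indicated by
the conditions

$$a_1:\ x > \max\left[\,1/2,\ \frac{z - 1/V}{1-y+z}\right]$$

$$a_2:\ x < \min\left[\,1/2,\ \frac{1 - z + 1/V}{1+y-z}\right]$$

$a_3$ :  all other values of x.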
The  contours  of  the  surfaces  which  bound  the 
decision  regions  are  shown  in  the  graphs  for 
selected  values  of  V.   For  example,  if  V=20, 
y=.8,  and  z=.9, 

the  pair  should  be  a  link  if      x  >  .77 
the  pair  should  be  a  nonlink  if   x  <  .17 
the  pair  should  be  field-checked  if 
.17  <  x  <  .77. 


It is of interest to note that the
decision regions are only mildly sensitive to
the value V. Thus if V=40 in the example
above, a field check would take place if
.14 < x < .80, while if V=10 the interval
would be .22 < x < .73.
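
The decision rule can be evaluated for any V, y and z. The sketch below assumes the utilities chosen in the text ($C_{11} = C_{22} = 0$, $C_{12} = C_{21} = -V$, K = 1), for which the boundaries reduce to $(z - 1/V)/(1-y+z)$ and $(1 - z + 1/V)/(1+y-z)$; the function names are hypothetical.

```python
def decision_thresholds(v: float, y: float, z: float) -> tuple[float, float]:
    """Return (nonlink_below, link_above) bounds on x = P(M | gamma).

    v: value of a 1/m reduction in relative error, in field-check units
    y: P(field check treats the pair as a link | true match)
    z: P(field check treats the pair as a nonlink | true non-match)
    """
    link_above = max(0.5, (z - 1.0 / v) / (1.0 - y + z))
    nonlink_below = min(0.5, (1.0 - z + 1.0 / v) / (1.0 + y - z))
    return nonlink_below, link_above


def choose_action(x: float, v: float, y: float, z: float) -> str:
    """Pick link, nonlink, or field check for a pair with match probability x."""
    nonlink_below, link_above = decision_thresholds(v, y, z)
    if x > link_above:
        return "link"
    if x < nonlink_below:
        return "nonlink"
    return "field check"


for v in (10, 20, 40):
    lo, hi = decision_thresholds(v, 0.8, 0.9)
    print(v, round(lo, 2), round(hi, 2))
```

For y = .8 and z = .9 this reproduces the intervals quoted in the text for V = 10, 20 and 40.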
THE USE OF THE PLOTTING MACHINE IN PLOTTING SCATTER DIAGRAMS

Benjamin  J.  Tepping 


The plotting machine associated with the
IBM 1401 computer in the Bureau of the Census
is sometimes used to plot "scatter diagrams,"
i.e., a set of discrete points $(x_i, y_i)$. The
points are presented to the machine in a
specified order, and the machine plots the
(i+1)-th point by moving the plotting pen a
distance $|x_{i+1} - x_i|$ horizontally and then a
distance $|y_{i+1} - y_i|$ vertically. Thus, the
total distance moved in the course of
plotting n points is

$$\ell = \sum_{i=1}^{n-1}\left\{\,|x_{i+1} - x_i| + |y_{i+1} - y_i|\,\right\}.$$


The purpose of this note is to examine the
question of how to arrange the data for
presentation to the plotter so as to minimize the
total distance, and therefore the time used on
the plotter.

RECOMMENDATION: Sort the points into S
"strips": divide the range of y into S equal
intervals and classify the points into these
intervals according to the value of $y_i$. Take
for S an integer which approximates

$$\sqrt{\frac{n\,b}{3\,a}}$$

where b is the total range of y, a is the
average range of x within a strip, and n is
the number of points to be plotted. Within
each strip, sequence the points according to
the value of $x_i$, ascending and descending in
alternate strips.

Example:

n = 120
b = 5
a = 1

$S = \sqrt{200} \approx 14$

The saving that can be accomplished is a
function primarily of the number of points, n,
to be plotted. The following table
illustrates the saving, for the case a = b, in the
distance travelled to traverse n points
selected at random from a uniform distribution.

    n       Saving
    100     83%
    200     88%
    500     92.2%
    1000    94.5%
    2000    96.1%
    5000    97.6%
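
The recommended ordering is easy to simulate. The following sketch sorts points into S equal-width strips in y, orders x ascending and descending in alternate strips, and compares the pen travel with that of the original (random) order. The function names, the random seed and the use of a unit square are assumptions made for illustration.

```python
import math
import random


def recommended_strips(n: int, a: float, b: float) -> int:
    """Integer approximating sqrt(n * b / (3 * a))."""
    return max(1, round(math.sqrt(n * b / (3.0 * a))))


def strip_order(points, strips):
    """Reorder points strip by strip, alternating the direction of x."""
    ys = [y for _, y in points]
    y_min, y_max = min(ys), max(ys)
    width = (y_max - y_min) / strips or 1.0   # guard against zero range
    buckets = [[] for _ in range(strips)]
    for x, y in points:
        j = min(int((y - y_min) / width), strips - 1)
        buckets[j].append((x, y))
    ordered = []
    for j, bucket in enumerate(buckets):
        bucket.sort(reverse=(j % 2 == 1))     # ascending, then descending
        ordered.extend(bucket)
    return ordered


def path_length(points):
    """Horizontal plus vertical pen travel between successive points."""
    return sum(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(1000)]
s = recommended_strips(len(pts), a=1.0, b=1.0)
saving = 1.0 - path_length(strip_order(pts, s)) / path_length(pts)
print(s, round(saving, 3))
```

For 1000 uniformly distributed points the simulated saving should fall near the tabulated value, though it varies somewhat from sample to sample.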

DISCUSSION 

It  is  evident  that  the  length  of  a  path 
given  by  any  prescribed  rule  will  depend  upon 
the  distribution  of  the  points  to  be  plotted. 
We  assume  here  that  these  points  are  a  random 
sample  of  n  from  a  uniform  distribution  over 
the  rectangle 

0  <  x  <  a,     0  <  y  <  b. 
Case  1:   The  sequence  of  points  (x. ,  y.) 
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is  in  random  order.   The  total  length  of  path 
is  then 


n-1 
Si     =  E 
i=l 


x.  -  x.  ,  I  +  y.  -  y.  , 
1  1    l+l '    IJi  Jx+1 


traversed.  Let  n.  be  the  number  of  points  in 
the  j-th  strip,  and  let  (x.n  ,  y„  )  denote  the 
k-th  point  in  the  j-th  strip.  Then  the  total 
length  of  path  is 


But  since  x.  and  x.  _  are  a  random  sample  of 
i      l+l  r 

2  from  a  uniform  distribution  in  the  interval 
0  <x  <  a, 

E|xi  -  xi+ii  =  !• 


Similarly, 


Hence 


E'yi  -yi+i'  =  !• 


E  I    =  ^jka  +  b). 


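Case 2: The sequence of points $(x_i, y_i)$ is such that the $x_i$ are the order statistics:

$$x_1 \le x_2 \le \cdots \le x_n.$$

Then, again,

$$E\,|y_i - y_{i+1}| = \frac{b}{3},$$

but

$$E\,|x_i - x_{i+1}| = \frac{a}{n+1}.$$

Hence

$$E\,\ell = (n-1)\left(\frac{a}{n+1} + \frac{b}{3}\right).$$

Case 3: The points $(x_i, y_i)$ are classified into $S$ strips, each of width $b/S$. The point $(x_i, y_i)$ is in the $j$-th strip if

$$\frac{(j-1)\,b}{S} < y_i \le \frac{j\,b}{S}.$$

Within each strip, the points are ordered according to the value of $x_i$, ascending and descending in alternate strips. This defines the sequence in which the points are plotted. Let $n_j$ denote the number of points in the $j$-th strip and $(x_{j,k}, y_{j,k})$ the $k$-th point of that strip in this sequence. Then

$$\ell = \sum_{j=1}^{S} \sum_{k=1}^{n_j-1} \left( |x_{j,k} - x_{j,k+1}| + |y_{j,k} - y_{j,k+1}| \right) + \sum_{j=1}^{S-1} \left( |x_{j,n_j} - x_{j+1,1}| + |y_{j,n_j} - y_{j+1,1}| \right).$$

As before,

$$E\left( |x_{j,k} - x_{j,k+1}| \,\big|\, n_j \right) = \frac{a}{n_j+1}, \qquad E\left( |y_{j,k} - y_{j,k+1}| \,\big|\, n_j \right) = \frac{b}{3S}.$$

Moreover,

$$E\left( |y_{j,n_j} - y_{j+1,1}| \,\big|\, n_j \right) = \frac{b}{S}.$$

For the moment let us define

$$d_j = E\left( |x_{j,n_j} - x_{j+1,1}| \,\big|\, n_j, n_{j+1} \right).$$

Then

$$E(\ell \mid n_1, \ldots, n_S) = \sum_{j=1}^{S} \sum_{k=1}^{n_j-1} \left[ \frac{a}{n_j+1} + \frac{b}{3S} \right] + \sum_{j=1}^{S-1} \left[ \frac{b}{S} + d_j \right]$$

$$= \sum_{j=1}^{S} \left[ \frac{n_j-1}{n_j+1}\,a + \frac{(n_j-1)\,b}{3S} \right] + \frac{S-1}{S}\,b + \sum_{j=1}^{S-1} d_j$$

$$= a \sum_{j=1}^{S} \frac{n_j-1}{n_j+1} + \frac{b}{3S}\,(n + 2S - 3) + \sum_{j=1}^{S-1} d_j.$$

Hence

$$E\,\ell = a\,E \sum_{j=1}^{S} \frac{n_j-1}{n_j+1} + \frac{b}{3S}\,(n + 2S - 3) + E \sum_{j=1}^{S-1} d_j$$

$$= a\,S\,E\,\frac{n_j-1}{n_j+1} + \frac{b}{3S}\,(n + 2S - 3) + (S-1)\,E\,d_j.$$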
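The sequencing rule of Case 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's procedure: the function names are hypothetical, strips are indexed from 0, and a point lying exactly on a strip boundary is assigned to the upper strip, which differs from the inequality above only on a set of probability zero.

```python
def strip_sequence(points, b, S):
    """Order points by strips of width b/S, alternating the x direction."""
    strips = [[] for _ in range(S)]
    for x, y in points:
        # 0-based strip index; y == b is placed in the last strip.
        j = min(int(y * S / b), S - 1)
        strips[j].append((x, y))
    ordered = []
    for j, strip in enumerate(strips):
        # Ascending x in strips 0, 2, 4, ...; descending x in strips 1, 3, ...
        strip.sort(key=lambda p: p[0], reverse=(j % 2 == 1))
        ordered.extend(strip)
    return ordered

def path_length(points):
    """Sum of |dx| + |dy| between consecutive points."""
    return sum(abs(p[0] - q[0]) + abs(p[1] - q[1])
               for p, q in zip(points, points[1:]))

pts = [(0.9, 0.1), (0.2, 0.8), (0.1, 0.2), (0.7, 0.6), (0.5, 0.4)]
seq = strip_sequence(pts, b=1.0, S=2)
print(seq)
print(path_length(pts), path_length(seq))
```

For these five points the sequenced path is shorter than the path through the points in their original order.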
Since $n_j$ is binomially distributed with parameters $n$ and $\frac{1}{S}$, it can be shown that

$$E\,\frac{n_j-1}{n_j+1} = 1 - \frac{2S}{n+1}\left[ 1 - \left( \frac{S-1}{S} \right)^{n+1} \right] \approx 1 - \frac{2S}{n+1}\left[ 1 - e^{-\frac{n+1}{S}} \right] \approx 1 - \frac{2S}{n+1}.$$
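The binomial expectation can be verified by direct summation. The sketch below is illustrative only (function names and the values of n and S are assumptions); it compares the exact sum over the binomial distribution with the closed form and with the two successive approximations.

```python
from math import comb, exp

def exact_expectation(n, S):
    """E[(N-1)/(N+1)] for N ~ Binomial(n, 1/S), by direct summation."""
    p = 1.0 / S
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * (k - 1) / (k + 1)
               for k in range(n + 1))

def closed_form(n, S):
    return 1 - 2 * S / (n + 1) * (1 - ((S - 1) / S) ** (n + 1))

def exponential_form(n, S):
    return 1 - 2 * S / (n + 1) * (1 - exp(-(n + 1) / S))

def simple_form(n, S):
    return 1 - 2 * S / (n + 1)

n, S = 100, 6
for f in (exact_expectation, closed_form, exponential_form, simple_form):
    print(f.__name__, f(n, S))
```

When n/S is moderately large the four values are nearly indistinguishable, which is why the last and simplest form is used in what follows.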


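To evaluate $d_j$, we may without loss of generality take $x_{j,n_j}$ and $x_{j+1,1}$ to be the maxima in the $j$-th and $(j+1)$-th strips respectively. It can be shown (see Appendix A) that

$$d_j = \frac{a}{n_j + n_{j+1} + 1} \left[ \frac{n_j}{n_{j+1}+1} + \frac{n_{j+1}}{n_j+1} \right].$$

Then for reasonably large values of $n$, an approximation is

$$E\,d_j \approx \frac{a}{\frac{2n}{S} + 1} \times 2\,\frac{\frac{n}{S}}{1 + \frac{n}{S}} = \frac{2a\,nS}{(2n+S)(S+n)} \approx \frac{aS}{n}.$$

Thus

$$E\,\ell \approx aS\left( 1 - \frac{2S}{n+1} \right) + \frac{b}{3S}\,(n + 2S - 3) + \frac{aS(S-1)}{n}.$$

Now to find the value of $S$ which minimizes $E\,\ell$, we set

$$0 = \frac{d}{dS}\,E\,\ell = a - \frac{4aS}{n+1} - \frac{bn}{3S^2} + \frac{b}{S^2} + \frac{a(2S-1)}{n}$$

$$= a\left[ \frac{n-1}{n} - \frac{2(n-1)}{n(n+1)}\,S - \frac{b}{a}\,\frac{n-3}{3}\,\frac{1}{S^2} \right]$$

$$= -a\,k\,\frac{n-1}{n}\,\frac{1}{S^2}\left[ \frac{2}{k(n+1)}\,S^3 - \frac{S^2}{k} + \frac{n}{n-1} \right],$$

where

$$k = \frac{b}{a}\,\frac{n-3}{3}.$$

Thus $S$ is a solution of the cubic equation

$$\frac{2}{k(n+1)}\,S^3 - \frac{S^2}{k} + \frac{n}{n-1} = 0,$$

which depends on $n$ and the ratio $b/a$, or

$$\frac{2}{n+1}\,S^3 - S^2 + \frac{n}{n-1}\,k = 0.$$

We observe that

$$\frac{d^2}{dS^2}\,E\,\ell = a\left[ -\frac{2(n-1)}{n(n+1)} + \frac{2k}{S^3} \right],$$

which is positive if and only if

$$S^3 < \frac{b}{a}\,\frac{n(n+1)(n-3)}{3(n-1)}.$$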


But the derivative of the left-hand side of the cubic equation to be solved is $2S\left( \frac{3S}{n+1} - 1 \right)$, so that one of the two positive solutions of the cubic equation is less than $(n+1)/3$ and the other is greater than $(n+1)/3$. For values of $b/a$ and $n$ to be used in practice,

$$\left( \frac{n+1}{3} \right)^3 > \frac{b}{a}\,\frac{n(n+1)(n-3)}{3(n-1)},$$

since

$$(n+1)^2 > \frac{b}{a}\,\frac{9n(n-3)}{n-1}$$

and certainly $b/a < 5$, so that the inequality will be satisfied when $n > 45$. Hence the minimum sought is the smaller of the two positive roots of the cubic.
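As a numerical illustration of this conclusion, the sketch below evaluates the approximate expected length at each integer number of strips up to (n+1)/3 and checks where the cubic changes sign. The function names and the choice n = 100, a = b = 1 are assumptions made for the example; the expressions are those derived in the text, with $k = \frac{b}{a}\,\frac{n-3}{3}$.

```python
def expected_length(S, n, a, b):
    """Approximate E[l] for S strips, as derived in the text."""
    return (a * S * (1 - 2 * S / (n + 1))
            + b / (3 * S) * (n + 2 * S - 3)
            + a * S * (S - 1) / n)

def cubic(S, n, a, b):
    """Left-hand side of the cubic, with k = (b/a)(n-3)/3."""
    k = (b / a) * (n - 3) / 3
    return 2 / (n + 1) * S**3 - S**2 + n / (n - 1) * k

n, a, b = 100, 1.0, 1.0
upper = (n + 1) // 3
# Best integer number of strips by brute force.
best = min(range(1, upper + 1), key=lambda S: expected_length(S, n, a, b))
# The cubic decreases on (0, (n+1)/3); a sign change brackets its smaller root.
brackets_root = cubic(best, n, a, b) > 0 > cubic(best + 1, n, a, b)
print(best, brackets_root)
```

In this example the best integer number of strips lies immediately below the smaller positive root of the cubic, in agreement with the argument above.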

For the usual values of $n$, the value

$$S_0 = \sqrt{\frac{n}{n-1}\,k} = \sqrt{\frac{n(n-3)}{n-1}\,\frac{b}{a}\,\frac{1}{3}}$$

is a good approximation to the root we seek. (See Appendix B for examples.) An easy approximation to use is

$$S = \sqrt{\frac{n}{3}\,\frac{b}{a}}.$$

The accompanying graph illustrates the saving that may be achieved as compared with plotting random, unsequenced points. For large $n$ (say $n > 100$), the expected length of path for unsequenced points is approximately $2an/3$, where $a$ is the side of the square within which the population of points is uniformly distributed. If the points are sequenced within $s$ strips, with $s = \sqrt{n/3}$, the expected length of path is approximately $2a\sqrt{n/3}$. The ratio of lengths is therefore approximately $\sqrt{n/3}$.
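The size of the saving can be tabulated directly from the formulas of Cases 1 and 3. The following sketch is illustrative (function names and the values of n are assumptions); it takes a = b = 1 and compares the ratio of expected lengths with $\sqrt{n/3}$.

```python
from math import sqrt

def unsequenced_length(n, a, b):
    """Case 1: expected path length for unsequenced points."""
    return (n - 1) * (a + b) / 3

def strip_length(n, a, b, s):
    """Case 3: approximate expected path length with s strips."""
    return (a * s * (1 - 2 * s / (n + 1))
            + b / (3 * s) * (n + 2 * s - 3)
            + a * s * (s - 1) / n)

ratios = {}
for n in (100, 300, 1200):
    s = sqrt(n / 3)
    ratios[n] = unsequenced_length(n, 1.0, 1.0) / strip_length(n, 1.0, 1.0, s)
    print(n, round(s, 2), round(ratios[n], 2))
```

The computed ratios fall slightly below $\sqrt{n/3}$ because the lower-order terms of both formulas are retained.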

The graph also illustrates the relative insensitivity of the length to small increases in the value of $s$ that is used.

The discussion above has been for the case of points distributed uniformly over a rectangle. The recommendation with which this note begins is based on the following argument. For a general distribution of points, let the range of $y$ be divided in $S_0$ strips. If $S_0$ is so chosen that

$$S_0 = \sqrt{\frac{n}{3}\,\frac{b}{R}},$$

where $R$ is the average range of $x$ within a strip, it will usually be found in practice that the points within a strip are much more nearly distributed uniformly over a rectangle than the aggregate of all the points. Within a strip, in such case, the optimum number of substrips in the average strip would be approximately

$$\sqrt{\frac{n/S_0}{3}\,\frac{b/S_0}{R}} = 1.$$

This leads to the recommendation.

APPENDIX A

The density function of the maximum in a sample of $n$ from a uniform distribution in $(0,1)$ is $h(u) = n\,u^{n-1}$. Let $v$ be the maximum in an independent sample of size $m$, so that $g(v) = m\,v^{m-1}$. Let $w = u - v$. Then

$$F(w) = \begin{cases} \displaystyle \int_{v=-w}^{1} \int_{u=0}^{w+v} h(u)\,g(v)\,du\,dv, & -1 \le w \le 0, \\[2ex] \displaystyle 1 - \int_{v=0}^{1-w} \int_{u=w+v}^{1} h(u)\,g(v)\,du\,dv, & 0 \le w \le 1, \end{cases}$$

with $F(w) = 0$ for $w < -1$ and $F(w) = 1$ for $w > 1$.

Substituting the explicit forms of $g(v)$ and $h(u)$, and integrating with respect to $u$, we get

$$F(w) = \begin{cases} \displaystyle m \int_{v=-w}^{1} v^{m-1}\,(w+v)^{n}\,dv, & -1 \le w \le 0, \\[2ex] \displaystyle 1 - m \int_{v=0}^{1-w} v^{m-1}\left[ 1 - (w+v)^{n} \right] dv, & 0 \le w \le 1. \end{cases}$$

Therefore,

$$f(w) = F'(w) = \begin{cases} \displaystyle m\,n \int_{v=-w}^{1} v^{m-1}\,(w+v)^{n-1}\,dv, & -1 \le w \le 0, \\[2ex] \displaystyle m\,n \int_{v=0}^{1-w} v^{m-1}\,(w+v)^{n-1}\,dv, & 0 \le w \le 1, \\[2ex] 0, & \text{elsewhere.} \end{cases}$$


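Now

$$E\,|w| = \int_{w=-1}^{0} (-w)\,f(w)\,dw + \int_{w=0}^{1} w\,f(w)\,dw.$$

For the second integral,

$$\int_{w=0}^{1} w\,f(w)\,dw = m\,n \int_{w=0}^{1} w \int_{v=0}^{1-w} v^{m-1}\,(w+v)^{n-1}\,dv\,dw$$

$$= m\,n \int_{w=0}^{1} \int_{v=0}^{1-w} w\,v^{m-1} \sum_{r=0}^{n-1} \binom{n-1}{r} w^{r}\,v^{n-1-r}\,dv\,dw$$

$$= m\,n \sum_{r=0}^{n-1} \binom{n-1}{r} \int_{w=0}^{1} \int_{v=0}^{1-w} w^{r+1}\,v^{m+n-2-r}\,dv\,dw$$

$$= m\,n \sum_{r=0}^{n-1} \binom{n-1}{r} \frac{1}{m+n-1-r} \int_{0}^{1} w^{r+1}\,(1-w)^{m+n-1-r}\,dw$$

$$= m\,n \sum_{r=0}^{n-1} \binom{n-1}{r} \frac{1}{m+n-1-r}\,\frac{(r+1)!\,(m+n-1-r)!}{(m+n+1)!}$$

$$= m\,n \sum_{r=0}^{n-1} \frac{(n-1)!}{(n-1-r)!}\,(r+1)\,\frac{(m+n-2-r)!}{(m+n+1)!}$$

$$= m\,n \sum_{x=0}^{n-1} \frac{(n-1)!}{x!}\,(n-x)\,\frac{(x+m-1)!}{(m+n+1)!}$$

$$= \frac{m\,n!}{(m+n+1)!} \sum_{x=0}^{n-1} (n-x)\,\frac{(x+m-1)!}{x!}$$

$$= \frac{m\,n!}{(m+n+1)!} \left[ n \sum_{x=0}^{n-1} \frac{(x+m-1)!}{x!} - \sum_{x=1}^{n-1} \frac{(x+m-1)!}{(x-1)!} \right].$$

Using the notation of the calculus of finite differences, with $z^{(r)} = z(z-1)\cdots(z-r+1)$,

$$\sum_{x=0}^{n-1} \frac{(x+m-1)!}{x!} = \sum_{x=0}^{n-1} (x+m-1)^{(m-1)} = \sum_{x=0}^{n-1} \frac{1}{m}\,\Delta\,(x+m-1)^{(m)} = \frac{1}{m}\,(n+m-1)^{(m)} = \frac{1}{m}\,\frac{(n+m-1)!}{(n-1)!},$$

$$\sum_{x=1}^{n-1} \frac{(x+m-1)!}{(x-1)!} = \sum_{x=1}^{n-1} (x+m-1)^{(m)} = \sum_{x=1}^{n-1} \frac{1}{m+1}\,\Delta\,(x+m-1)^{(m+1)} = \frac{1}{m+1}\,(n+m-1)^{(m+1)} = \frac{1}{m+1}\,\frac{(n+m-1)!}{(n-2)!}.$$

Hence

$$\int_{w=0}^{1} w\,f(w)\,dw = \frac{m\,n!}{(m+n+1)!} \left[ \frac{n}{m}\,\frac{(n+m-1)!}{(n-1)!} - \frac{1}{m+1}\,\frac{(n+m-1)!}{(n-2)!} \right]$$

$$= \frac{mn}{(m+n)(m+n+1)} \left[ \frac{n}{m} - \frac{n-1}{m+1} \right] = \frac{n}{(m+1)(m+n+1)}.$$

Similarly, or by a simple argument of symmetry,

$$\int_{w=-1}^{0} (-w)\,f(w)\,dw = \frac{m}{(n+1)(m+n+1)}.$$

Therefore

$$E\,|w| = \frac{1}{m+n+1} \left( \frac{m}{n+1} + \frac{n}{m+1} \right).$$

Furthermore, if the range of the uniform distribution is $a$, it is clear that the expected value is simply multiplied by $a$.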
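The closed forms of this appendix can be checked in exact rational arithmetic. The sketch below is illustrative (the function names are assumptions); it evaluates the finite sum $\frac{m\,n!}{(m+n+1)!}\sum_{x=0}^{n-1}(n-x)\frac{(x+m-1)!}{x!}$ and compares it with $\frac{n}{(m+1)(m+n+1)}$ for small $m$ and $n$.

```python
from fractions import Fraction
from math import factorial

def positive_part_sum(m, n):
    """Finite-sum form of the integral of w f(w) over 0 <= w <= 1."""
    prefactor = Fraction(m * factorial(n), factorial(m + n + 1))
    total = sum(Fraction((n - x) * factorial(x + m - 1), factorial(x))
                for x in range(n))
    return prefactor * total

def positive_part_closed(m, n):
    """Closed form n / ((m+1)(m+n+1))."""
    return Fraction(n, (m + 1) * (m + n + 1))

def expected_abs_w(m, n):
    """E|w| = (1/(m+n+1)) * (m/(n+1) + n/(m+1))."""
    return Fraction(1, m + n + 1) * (Fraction(m, n + 1) + Fraction(n, m + 1))

agree = all(positive_part_sum(m, n) == positive_part_closed(m, n)
            for m in range(1, 7) for n in range(1, 7))
print(agree, expected_abs_w(1, 1))
```

With m = n = 1 each maximum is a single uniform observation, and the formula reduces to the familiar expected absolute difference of two independent uniform variables on (0,1).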
APPENDIX B

Examples:

I. Take $b/a = 1$, $n = 100$. Then $k = \frac{97}{3}$, and we need to solve the cubic equation

$$f(S) = \frac{2}{101}\,S^3 - S^2 + \frac{100 \times 97}{99 \times 3} = 0.$$

The approximation suggested is $S_0 = 10\sqrt{\frac{97}{297}} = 5.714887$, while the actual root is $S = 6.094453$.

II. Take $b/a = 2$, $n = 500$. Then $k = \frac{2 \times 497}{3}$,

$$f(S) = \frac{2}{501}\,S^3 - S^2 + \frac{500 \times 2 \times 497}{499 \times 3} = 0.$$

Then $S_0 = 18.220794$ and $S = 18.951800$.

III. Take $b/a = 1$, $n = 2000$. Then $k = \frac{1997}{3}$,

$$f(S) = \frac{2}{2001}\,S^3 - S^2 + \frac{2000 \times 1997}{3 \times 1999} = 0.$$

Then $S_0 = 25.806969$ and $S = 26.1$

IV. Take $b/a = 5$, $n = 120$. Then $k = \frac{5 \times 117}{3}$,

$$f(S) = \frac{2}{121}\,S^3 - S^2 + \frac{120 \times 5 \times 117}{119 \times 3} = 0.$$

Then $S_0 = 14.022791$ and $S = 16.430142$.
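The four examples can be recomputed with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and bisection on the interval (0, (n+1)/3) is one convenient way to locate the smaller positive root, not necessarily the method used by the author. On that interval the cubic decreases from a positive value to a negative one whenever the inequality $(n+1)^2 > \frac{b}{a}\,\frac{9n(n-3)}{n-1}$ of the main text holds.

```python
from math import sqrt

def cubic(S, n, ratio):
    """(2/(n+1)) S^3 - S^2 + (n/(n-1)) k, with k = ratio * (n-3) / 3."""
    k = ratio * (n - 3) / 3
    return 2 / (n + 1) * S**3 - S**2 + n / (n - 1) * k

def approx_root(n, ratio):
    """S_0 = sqrt(n k / (n-1))."""
    return sqrt(n / (n - 1) * ratio * (n - 3) / 3)

def smaller_root(n, ratio, tol=1e-10):
    """Bisection for the root of the cubic in (0, (n+1)/3)."""
    lo, hi = 1e-9, (n + 1) / 3
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cubic(mid, n, ratio) > 0:
            lo = mid   # still left of the root
        else:
            hi = mid   # at or right of the root
    return (lo + hi) / 2

examples = {"I": (1, 100), "II": (2, 500), "III": (1, 2000), "IV": (5, 120)}
results = {name: (approx_root(n, r), smaller_root(n, r))
           for name, (r, n) in examples.items()}
for name, (s0, s) in results.items():
    print(f"{name}: S_0 = {s0:.6f}, S = {s:.6f}")
```

In every case $S_0$ lies somewhat below the root, as the tabulated values indicate.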